1.
PLOS Digit Health ; 3(4): e0000327, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38652722

ABSTRACT

As the world emerges from the COVID-19 pandemic, there is an urgent need to understand patient factors that may be used to predict the occurrence of severe cases and patient mortality. Approximately 20% of SARS-CoV-2 infections lead to acute respiratory distress syndrome caused by the harmful actions of inflammatory mediators. Patients with severe COVID-19 are often afflicted with neurologic symptoms, and individuals with pre-existing neurodegenerative disease have an increased risk of severe COVID-19. Although collectively, these observations point to a bidirectional relationship between severe COVID-19 and neurologic disorders, little is known about the underlying mechanisms. Here, we analyzed the electronic health records of 471 patients with severe COVID-19 to identify clinical characteristics most predictive of mortality. Feature discovery was conducted by training a regularized logistic regression classifier that serves as a machine-learning model with an embedded feature selection capability. SHAP analysis using the trained classifier revealed that a small ensemble of readily observable clinical features, including characteristics associated with cognitive impairment, could predict in-hospital mortality with an accuracy greater than 0.85 (expressed as the area under the ROC curve of the classifier). These findings have important implications for the prioritization of clinical measures used to identify patients with COVID-19 (and, potentially, other forms of acute respiratory distress syndrome) having an elevated risk of death.
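The abstract above reports classifier performance as the area under the ROC curve (AUC). As a minimal sketch of what that number measures, the function below computes AUC in its rank form: the fraction of (positive, negative) pairs in which the positive case receives the higher score. The function name, the 0/1 label coding, and the toy data are assumptions made for illustration; none of them come from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the pairwise-ranking formulation.

    labels: 0/1 outcomes (1 = positive class; this coding is an assumption).
    scores: predicted risk per case, higher meaning more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.35, 0.4, 0.3, 0.2, 0.1]
auc = roc_auc(labels, scores)  # 11 of 12 pairs are ranked correctly
```

The pairwise loop is quadratic in the number of cases, which is fine for a few hundred records; a sort-based version would be preferable for larger cohorts.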

2.
iScience ; 27(2): 108819, 2024 Feb 16.
Article in English | MEDLINE | ID: mdl-38303691

ABSTRACT

Understanding brain response to audiovisual stimuli is a key challenge in understanding neuronal processes. In this paper, we describe our effort aimed at reconstructing video frames from observed functional MRI images. We also demonstrate that our model can predict visual objects. Our method constructs an autoencoder model for a set of training video segments to code video streams into their corresponding latent representations. Next, we learn a mapping from the observed fMRI response to the corresponding latent video frame representation. Finally, we pass the latent vectors computed using the fMRI response through the decoder to reconstruct the predicted image. We show that the representations of video frames and those constructed from corresponding fMRI images are highly clustered, the latent representations can be used to predict objects in video frames using just the fMRI frames, and fMRI responses can be used to reconstruct the inputs to predict the presence of faces.

3.
Liver Transpl ; 30(3): 244-253, 2024 03 01.
Article in English | MEDLINE | ID: mdl-37556190

ABSTRACT

Understanding the prognostic significance of acute kidney injury (AKI) stage 1B [serum creatinine (sCr) ≥1.5 mg/dL] compared with stage 1A (sCr < 1.5 mg/dL) in a US population is important as it can impact initial management decisions for AKI in hospitalized cirrhosis patients. Therefore, we aimed to define outcomes associated with stage 1B in a nationwide US cohort of hospitalized cirrhosis patients with AKI. Hospitalized cirrhosis patients with AKI in the Cerner-Health-Facts database from January 2009 to September 2017 (n = 6250) were assessed for AKI stage 1 (≥1.5-2-fold increase in sCr from baseline) and were followed for 90 days for outcomes. The primary outcome was 90-day mortality; secondary outcomes were in-hospital AKI progression and AKI recovery. Competing-risk multivariable analysis was performed to determine the independent association between stage 1B, 90-day mortality (liver transplant as a competing risk), and AKI recovery (death/liver transplant as a competing risk). Multivariable logistic regression analysis was performed to determine the independent association between stage 1B and AKI progression. In all, 4654 patients with stage 1 were analyzed: 1A (44.3%) and 1B (55.7%). Stage 1B patients had a significantly higher cumulative incidence of 90-day mortality compared with stage 1A patients, 27.2% versus 19.7% (p < 0.001). In multivariable competing-risk analysis, patients with stage 1B (vs. 1A) had a higher risk for mortality at 90 days [sHR 1.52 (95% CI 1.20-1.92), p = 0.001] and decreased probability for AKI recovery [sHR 0.76 (95% CI 0.69-0.83), p < 0.001]. Furthermore, in multivariable logistic regression analysis, AKI stage 1B (vs. 1A) was independently associated with AKI progression, OR 1.42 (95% CI 1.14-1.72) (p < 0.001). AKI stage 1B patients have a significantly higher risk for 90-day mortality, AKI progression, and reduced probability of AKI recovery compared with AKI stage 1A patients. These results could guide initial management decisions for AKI in hospitalized patients with cirrhosis.
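The stage 1A/1B split described in this abstract can be sketched as a small classification rule. This is only an illustration of the thresholds as the abstract words them (a 1.5 to 2-fold rise in sCr from baseline, then a cut at 1.5 mg/dL); the function name is hypothetical, treating both ends of the fold-change window as inclusive is an assumption, and the full consensus criteria contain details the abstract does not state.

```python
def aki_stage1_substage(baseline_scr, current_scr):
    """Return '1A', '1B', or None if the rise is outside the stage 1 range.

    Both arguments are serum creatinine values in mg/dL. The fold-change
    window (1.5 to 2.0) follows the abstract's wording; treating both
    bounds as inclusive is an assumption made for this sketch.
    """
    if baseline_scr <= 0:
        raise ValueError("baseline creatinine must be positive")
    fold = current_scr / baseline_scr
    if not (1.5 <= fold <= 2.0):
        return None
    # Stage 1B: absolute sCr at or above 1.5 mg/dL; otherwise stage 1A.
    return "1B" if current_scr >= 1.5 else "1A"


# Invented example values, not patient data.
example_a = aki_stage1_substage(0.8, 1.3)   # fold 1.625, sCr below 1.5
example_b = aki_stage1_substage(1.0, 1.6)   # fold 1.6, sCr at least 1.5
example_none = aki_stage1_substage(1.0, 1.2)  # fold 1.2, below stage 1
```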


Subjects
Acute Kidney Injury; Liver Transplantation; Humans; Prognosis; Liver Transplantation/adverse effects; Liver Cirrhosis/complications; Liver Cirrhosis/diagnosis; Liver Cirrhosis/epidemiology; Fibrosis; Acute Kidney Injury/diagnosis; Acute Kidney Injury/epidemiology; Acute Kidney Injury/etiology; Risk Factors; Retrospective Studies
4.
Aliment Pharmacol Ther ; 57(12): 1397-1406, 2023 06.
Article in English | MEDLINE | ID: mdl-36883210

ABSTRACT

BACKGROUND: In patients with cirrhosis and acute kidney injury (AKI), longer time to AKI-recovery may increase the risk of subsequent major-adverse-kidney-events (MAKE). AIMS: To examine the association between timing of AKI-recovery and risk of MAKE in patients with cirrhosis. METHODS: Hospitalised patients with cirrhosis and AKI (n = 5937) in a nationwide database were assessed for time to AKI-recovery and followed for 180-days. Timing of AKI-recovery (return of serum creatinine <0.3 mg/dL of baseline) from AKI-onset was grouped by Acute-Disease-Quality-Initiative Renal Recovery consensus: 0-2, 3-7, and >7-days. Primary outcome was MAKE at 90-180-days. MAKE is an accepted clinical endpoint in AKI and defined as the composite outcome of ≥25% decline in estimated-glomerular-filtration-rate (eGFR) compared with baseline with the development of de-novo chronic-kidney-disease (CKD) stage ≥3 or CKD progression (≥50% reduction in eGFR compared with baseline) or new haemodialysis or death. Landmark competing-risk multivariable analysis was performed to determine the independent association between timing of AKI-recovery and risk of MAKE. RESULTS: 4655 (75%) achieved AKI-recovery: 0-2 (60%), 3-7 (31%), and >7-days (9%). Cumulative-incidence of MAKE was 15%, 20%, and 29% for 0-2, 3-7, >7-days recovery groups, respectively. On adjusted multivariable competing-risk analysis, compared to 0-2-days, recovery at 3-7 and >7-days was independently associated with an increased risk for MAKE: sHR 1.45 (95% CI 1.01-2.09, p = 0.042), sHR 2.33 (95% CI 1.40-3.90, p = 0.001), respectively. CONCLUSION: Longer time to recovery is associated with an increased risk of MAKE in patients with cirrhosis and AKI. Further research should examine interventions to shorten AKI-recovery time and its impact on subsequent outcomes.


Subjects
Acute Kidney Injury; Renal Insufficiency, Chronic; Humans; Risk Factors; Disease Progression; Retrospective Studies; Kidney; Renal Insufficiency, Chronic/complications; Liver Cirrhosis/complications; Glomerular Filtration Rate
5.
J Hepatol ; 77(1): 108-115, 2022 07.
Article in English | MEDLINE | ID: mdl-35217065

ABSTRACT

BACKGROUND & AIMS: Acute kidney disease (AKD) is the persistence of acute kidney injury (AKI) for up to 3 months, which is proposed to be the time-window where critical interventions can be initiated to alter downstream outcomes of AKI. In cirrhosis, AKD and its impact on outcomes have been scantly investigated. We aimed to define the incidence and outcomes associated with AKD in a nationwide US cohort of hospitalized patients with cirrhosis and AKI. METHODS: Hospitalized patients with cirrhosis and AKI in the Cerner-Health-Facts database from 1/2009-09/2017 (n = 6,250) were assessed for AKD and were followed-up for 180 days. AKI and AKD were defined based on KDIGO and ADQI AKD and renal recovery consensus criteria, respectively. The primary outcome measure was mortality, and the secondary outcome measure was de novo chronic kidney disease (CKD). Competing-risk multivariable models were used to determine the independent association of AKD with primary and secondary outcomes. RESULTS: AKD developed in 32% of our cohort. On multivariable competing-risk analysis adjusting for significant confounders, patients with AKD had higher risk of mortality at 90 (subdistribution hazard ratio [sHR] 1.37; 95% CI 1.14-1.66; p = 0.001) and 180 (sHR 1.37; 95% CI 1.14-1.64; p = 0.001) days. The incidence of de novo CKD was 37.5%: patients with AKD had higher rates of de novo CKD (64.0%) compared to patients without AKD (30.7%; p <0.001). After adjusting for confounders, AKD was independently associated with de novo CKD (sHR 2.52; 95% CI 2.01-3.15; p <0.001) on multivariable competing-risk analysis. CONCLUSIONS: AKD develops in 1 in 3 hospitalized patients with cirrhosis and AKI and it is associated with worse survival and de novo CKD. Interventions that target AKD may improve outcomes of patients with cirrhosis and AKI. 
LAY SUMMARY: In a nationwide US cohort of hospitalized patients with cirrhosis and acute kidney injury, acute kidney disease developed in 1 in 3 patients and was associated with worse survival and chronic kidney disease. Interventions that target acute kidney disease may improve outcomes of patients with cirrhosis and acute kidney injury.


Subjects
Acute Kidney Injury; Renal Insufficiency, Chronic; Acute Disease; Acute Kidney Injury/complications; Acute Kidney Injury/etiology; Humans; Kidney; Liver Cirrhosis/complications; Liver Cirrhosis/epidemiology; Renal Insufficiency, Chronic/complications; Renal Insufficiency, Chronic/epidemiology; Risk Factors
6.
Liver Int ; 42(1): 187-198, 2022 01.
Article in English | MEDLINE | ID: mdl-34779104

ABSTRACT

BACKGROUND & AIMS: Guidelines recommend albumin as the plasma-expander of choice for acute kidney injury (AKI) in cirrhosis. However, the impact of these recommendations on patient outcomes remains unclear. We aimed to determine the practice-patterns and outcomes associated with albumin use in a large, nationwide-US cohort of hospitalized cirrhotics with AKI. METHODS: A retrospective cohort study was performed in hospitalized cirrhotics with AKI using Cerner-Health-Facts database from January 2009 to March 2018. 6786 were included for analysis on albumin-practice-patterns, and 4126 had available outcomes data. Propensity-score-adjusted model was used to determine the association between albumin use, AKI-recovery and in-hospital survival. RESULTS: Median age was 61-years (60% male, 70% white), median serum-creatinine was 1.8 mg/dL and median Model for End-stage Liver Disease Sodium (MELD-Na) score was 24. Albumin was given to 35% of patients, of which 50% received albumin within 48-hours of AKI-onset, and 17% received appropriate weight-based dosing. Albumin was used more frequently in patients with advanced complications of cirrhosis, higher MELD-Na scores and patients admitted to urban-teaching hospitals. After propensity-matching and multivariable adjustment, albumin use was not associated with AKI-recovery (odds ratio [OR] 0.70, 95% confidence-interval [CI]: 0.59-1.07, P = .130) or in-hospital survival (OR 0.76 [95% CI: 0.46-1.25], P = .280), compared with crystalloids. Findings were unchanged in subgroup analyses of patients with varying cirrhosis complications and disease severity. CONCLUSIONS: USA hospitalized patients with cirrhosis and AKI frequently do not receive intravenous albumin, and albumin use was not associated with improved clinical outcomes. Prospective randomised trials are direly needed to evaluate the impact of albumin in cirrhotics with AKI.


Subjects
Acute Kidney Injury; End Stage Liver Disease; Acute Kidney Injury/etiology; Albumins/therapeutic use; End Stage Liver Disease/complications; Female; Humans; Liver Cirrhosis/complications; Liver Cirrhosis/drug therapy; Male; Middle Aged; Prospective Studies; Retrospective Studies; Risk Factors; Severity of Illness Index
7.
PLOS Digit Health ; 1(11): e0000130, 2022 Nov.
Article in English | MEDLINE | ID: mdl-36812596

ABSTRACT

Sepsis accounts for more than 50% of hospital deaths, and the associated cost ranks the highest among hospital admissions in the US. Improved understanding of disease states, progression, severity, and clinical markers has the potential to significantly improve patient outcomes and reduce cost. We develop a computational framework that identifies disease states in sepsis and models disease progression using clinical variables and samples in the MIMIC-III database. We identify six distinct patient states in sepsis, each associated with different manifestations of organ dysfunction. We find that patients in different sepsis states are statistically significantly composed of distinct populations with disparate demographic and comorbidity profiles. Our progression model accurately characterizes the severity level of each pathological trajectory and identifies significant changes in clinical variables and treatment actions during sepsis state transitions. Collectively, our framework provides a holistic view of sepsis, and our findings provide the basis for future development of clinical trials, prevention, and therapeutic strategies for sepsis.

8.
Front Neurosci ; 15: 549322, 2021.
Article in English | MEDLINE | ID: mdl-33889066

ABSTRACT

Recent neuroimaging studies have shown that functional connectomes are unique to individuals, i.e., two distinct fMRIs taken over different sessions of the same subject are more similar in terms of their connectomes than those from two different subjects. In this study, we present new results that identify specific parts of resting state and task-specific connectomes that are responsible for the unique signatures. We show that a very small part of the connectome can be used to derive features for discriminating between individuals. A network of these features is shown to achieve excellent training and test accuracy in matching imaging datasets. We show that these features are statistically significant, robust to perturbations, invariant across populations, and are localized to a small number of structural regions of the brain. Furthermore, we show that for task-specific connectomes, the regions identified by our method are consistent with their known functional characterization. We present a new matrix sampling technique to derive computationally efficient and accurate methods for identifying the discriminating sub-connectome and support all of our claims using state-of-the-art statistical tests and computational techniques.

9.
Database (Oxford) ; 2020, 2020 01 01.
Article in English | MEDLINE | ID: mdl-32294194

ABSTRACT

MOTIVATION: Biomolecular data stored in public databases is increasingly specialized to organisms, context/pathology and tissue type, potentially resulting in significant overhead for analyses. These networks are often specializations of generic interaction sets, presenting opportunities for reducing storage and computational cost. Therefore, it is desirable to develop effective compression and storage techniques, along with efficient algorithms and a flexible query interface capable of operating on compressed data structures. Current graph databases offer varying levels of support for network integration. However, these solutions do not provide efficient methods for the storage and querying of versioned networks. RESULTS: We present VerTIoN, a framework consisting of novel data structures and associated query mechanisms for integrated querying of versioned context-specific biological networks. As a use case for our framework, we study network proximity queries in which the user can select and compose a combination of tissue-specific and generic networks. Using our compressed version tree data structure, in conjunction with state-of-the-art numerical techniques, we demonstrate real-time querying of large network databases. CONCLUSION: Our results show that it is possible to support flexible queries defined on heterogeneous networks composed at query time while drastically reducing response time for multiple simultaneous queries. The flexibility offered by VerTIoN in composing integrated network versions opens significant new avenues for the utilization of ever increasing volume of context-specific network data in a broad range of biomedical applications. AVAILABILITY AND IMPLEMENTATION: VerTIoN is implemented as a C++ library and is available at http://compbio.case.edu/omics/software/vertion and https://github.com/tjcowman/vertion. CONTACT: tyler.cowman@case.edu.


Subjects
Computational Biology/methods; Databases, Factual; Gene Regulatory Networks; Protein Interaction Maps; Algorithms; Data Curation/methods; Data Mining/methods; Humans; Internet; User-Computer Interface
10.
IEEE Trans Inf Theory ; 66(8): 5003-5021, 2020.
Article in English | MEDLINE | ID: mdl-33746243

ABSTRACT

The von Neumann entropy, named after John von Neumann, is an extension of the classical concept of entropy to the field of quantum mechanics. From a numerical perspective, von Neumann entropy can be computed simply by computing all eigenvalues of a density matrix, an operation that could be prohibitively expensive for large-scale density matrices. We present and analyze three randomized algorithms to approximate von Neumann entropy of real density matrices: our algorithms leverage recent developments in the Randomized Numerical Linear Algebra (RandNLA) literature, such as randomized trace estimators, provable bounds for the power method, and the use of random projections to approximate the eigenvalues of a matrix. All three algorithms come with provable accuracy guarantees and our experimental evaluations support our theoretical findings showing considerable speedup with small loss in accuracy.
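For reference, the quantity discussed in this abstract can be written out explicitly. The notation below ($\rho$ for the density matrix, $\lambda_i$ for its eigenvalues, $g_j$ for random probe vectors, $s$ for the number of probes) is an assumption chosen for illustration and is not taken from the paper. The first identity is the standard definition, which is why a full eigendecomposition suffices; the second is the generic form of a randomized trace estimator of the kind the abstract mentions, not necessarily the exact estimator the authors analyze.

```latex
% Von Neumann entropy of a density matrix \rho (positive semidefinite, unit trace),
% with eigenvalues \lambda_1, \dots, \lambda_n and the convention 0 \log 0 = 0:
H(\rho) \;=\; -\operatorname{Tr}(\rho \log \rho) \;=\; -\sum_{i=1}^{n} \lambda_i \log \lambda_i .

% Generic randomized trace estimator for a matrix function f(\rho) = -\rho \log \rho,
% using s random vectors g_j with \mathbb{E}[g_j g_j^{\top}] = I:
\operatorname{Tr}\bigl(f(\rho)\bigr) \;\approx\; \frac{1}{s} \sum_{j=1}^{s} g_j^{\top} f(\rho)\, g_j .
```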

11.
BMC Bioinformatics ; 20(1): 488, 2019 Oct 07.
Article in English | MEDLINE | ID: mdl-31590652

ABSTRACT

BACKGROUND: The data deluge can leverage sophisticated ML techniques for functionally annotating the regulatory non-coding genome. The challenge lies in selecting the appropriate classifier for the specific functional annotation problem, within the bounds of the hardware constraints and the model's complexity. In our system AIKYATAN, we annotate distal epigenomic regulatory sites, e.g., enhancers. Specifically, we develop a binary classifier that classifies genome sequences as distal regulatory regions or not, given their histone modifications' combinatorial signatures. This problem is challenging because the regulatory regions are distal to the genes, with diverse signatures across classes (e.g., enhancers and insulators) and even within each class (e.g., different enhancer sub-classes). RESULTS: We develop a suite of ML models, under the banner AIKYATAN, including SVM models, random forest variants, and deep learning architectures, for distal regulatory element (DRE) detection. We demonstrate, with strong empirical evidence, that deep learning approaches have a computational advantage. In addition, convolutional neural networks (CNN) provide the best-in-class accuracy, superior to the vanilla variant. With the human embryonic cell line H1, CNN achieves an accuracy of 97.9% and an order of magnitude lower runtime than the kernel SVM. Running on a GPU, the training time is sped up 21x and 30x (over CPU) for DNN and CNN, respectively. Finally, our CNN model enjoys superior prediction performance vis-à-vis the competition. Specifically, AIKYATAN-CNN achieved 40% higher validation rate versus CSIANN and the same accuracy as RFECS. CONCLUSIONS: Our exhaustive experiments using an array of ML tools validate the need for a model that is not only expressive but can scale with increasing data volumes and diversity. In addition, a subset of these datasets has image-like properties and benefits from spatial pooling of features. Our AIKYATAN suite leverages diverse epigenomic datasets that can then be modeled using CNNs with optimized activation and pooling functions. The goal is to capture the salient features of the integrated epigenomic datasets for deciphering the distal (non-coding) regulatory elements, which have been found to be associated with functional variants. Our source code will be made publicly available at: https://bitbucket.org/cellsandmachines/aikyatan.


Subjects
Chromosome Mapping/methods; Deep Learning; Epigenomics/methods; Regulatory Sequences, Nucleic Acid; Software; Cell Line; Humans
12.
Sci Rep ; 9(1): 3057, 2019 02 28.
Article in English | MEDLINE | ID: mdl-30816140

ABSTRACT

The problem of reverse-engineering the evolution of a dynamic network, known broadly as network archaeology, is one of profound importance in diverse application domains. In analysis of infection spread, it reveals the spatial and temporal processes underlying infection. In analysis of biomolecular interaction networks (e.g., protein interaction networks), it reveals early molecules that are known to be differentially implicated in diseases. In economic networks, it reveals flow of capital and associated actors. Beyond these recognized applications, it provides analytical substrates for novel studies - for instance, on the structural and functional evolution of the human brain connectome. In this paper, we model, formulate, and rigorously analyze the problem of inferring the arrival order of nodes in a dynamic network from a single snapshot. We derive limits on solutions to the problem, present methods that approach this limit, and demonstrate the methods on a range of applications, from inferring the evolution of the human brain connectome to conventional citation and social networks, where ground truth is known.


Subjects
Algorithms; Information Dissemination; Systems Analysis; Connectome; Humans; Online Social Networking; Protein Interaction Maps
13.
Brief Bioinform ; 20(1): 235-244, 2019 01 18.
Article in English | MEDLINE | ID: mdl-28968781

ABSTRACT

Federation is a popular concept in building distributed cyberinfrastructures, whereby computational resources are provided by multiple organizations through a unified portal, decreasing the complexity of moving data back and forth among multiple organizations. Federation has been used in bioinformatics only to a limited extent, namely, federation of datastores, e.g. SBGrid Consortium for structural biology and Gene Expression Omnibus (GEO) for functional genomics. Here, we posit that it is important to federate both computational resources (CPU, GPU, FPGA, etc.) and datastores to support popular bioinformatics portals, with fast-increasing data volumes and increasing processing requirements. A prime example, and one that we discuss here, is in genomics and metagenomics. It is critical that the processing of the data be done without having to transport the data across large network distances. We exemplify our design and development through our experience with metagenomics-RAST (MG-RAST), the most popular metagenomics analysis pipeline. Currently, it is hosted completely at Argonne National Laboratory. However, through a recently started collaborative National Institutes of Health project, we are taking steps toward federating this infrastructure. Being a widely used resource, we have to move toward federation without disrupting 50 K annual users. In this article, we describe the computational tools that will be useful for federating a bioinformatics infrastructure and the open research challenges that we see in federating such infrastructures. It is hoped that our manuscript can serve to spur greater federation of bioinformatics infrastructures by showing the steps involved, and thus, allow them to scale to support larger user bases.


Subjects
Genomics/statistics & numerical data; Information Dissemination/methods; Big Data; Computational Biology/methods; Confidentiality; Databases, Genetic/statistics & numerical data; Genetic Privacy; Humans; Metagenomics/statistics & numerical data; Software; United States
14.
Brief Bioinform ; 20(4): 1151-1159, 2019 07 19.
Article in English | MEDLINE | ID: mdl-29028869

ABSTRACT

As technologies change, MG-RAST is adapting. Newly available software is being included to improve accuracy and performance. As a computational service constantly running large-volume scientific workflows, MG-RAST is the right location to perform benchmarking and implement algorithmic or platform improvements, in many cases involving trade-offs between specificity, sensitivity and run-time cost. The work in [Glass EM, Dribinsky Y, Yilmaz P, et al. ISME J 2014;8:1-3] is an example; we use existing well-studied data sets as gold standards representing different environments and different technologies to evaluate any changes to the pipeline. Currently, we use well-understood data sets in MG-RAST as a platform for benchmarking. The use of artificial data sets for pipeline performance optimization has not added value, as these data sets do not present the same challenges as real-world data sets. In addition, the MG-RAST team welcomes suggestions for improvements of the workflow. We are currently working on versions 4.02 and 4.1, both of which contain significant input from the community and our partners that will enable double barcoding and stronger inferences supported by longer-read technologies, and will increase throughput while maintaining sensitivity by using Diamond and SortMeRNA. On the technical platform side, the MG-RAST team intends to support the Common Workflow Language as a standard to specify bioinformatics workflows, both to facilitate development and efficient high-performance implementation of the community's data analysis tasks.


Subjects
High-Throughput Nucleotide Sequencing/methods; Metagenome; Metagenomics/methods; Software; Algorithms; Budgets; Computational Biology/methods; High-Throughput Nucleotide Sequencing/economics; High-Throughput Nucleotide Sequencing/statistics & numerical data; Internet; Metagenomics/economics; Metagenomics/statistics & numerical data; Sequence Analysis, DNA/economics; Sequence Analysis, DNA/methods; Sequence Analysis, DNA/statistics & numerical data; User-Computer Interface; Workflow
15.
IEEE/ACM Trans Comput Biol Bioinform ; 15(4): 1037-1051, 2018.
Article in English | MEDLINE | ID: mdl-29993641

ABSTRACT

BACKGROUND: MicroRNAs (miRNAs) are approximately 22-nucleotide long regulatory RNA that mediate RNA interference by binding to cognate mRNA target regions. Here, we present a distributed kernel SVM-based binary classification scheme to predict miRNA targets. It captures the spatial profile of miRNA-mRNA interactions via smooth B-spline curves. This is accomplished separately for various input features, such as thermodynamic and sequence-based features. Further, we use a principled approach to uniformly model both canonical and non-canonical seed matches, using a novel seed enrichment metric. Finally, we verify our miRNA-mRNA pairings using an Elastic Net-based regression model on TCGA expression data for four cancer types to estimate the miRNAs that together regulate any given mRNA. RESULTS: We present a suite of algorithms for miRNA target prediction, under the banner Avishkar, with superior prediction performance over the competition. Specifically, our final kernel SVM model, with an Apache Spark backend, achieves an average true positive rate (TPR) of more than 75 percent, when keeping the false positive rate of 20 percent, for non-canonical human miRNA target sites. This is an improvement of over 150 percent in the TPR for non-canonical sites, over the best-in-class algorithm. We are able to achieve such superior performance by representing the thermodynamic and sequence profiles of miRNA-mRNA interaction as curves, devising a novel seed enrichment metric, and learning an ensemble of miRNA family-specific kernel SVM classifiers. We provide an easy-to-use system for large-scale interactive analysis and prediction of miRNA targets. All operations in our system, namely candidate set generation, feature generation and transformation, training, prediction, and computing performance metrics are fully distributed and are scalable. 
CONCLUSIONS: We have developed an efficient SVM-based model for miRNA target prediction using recent CLIP-seq data, demonstrating superior performance, evaluated using ROC curves for different species (human or mouse), or different target types (canonical or non-canonical). We analyzed the agreement between the target pairings using CLIP-seq data and using expression data from four cancer types. To the best of our knowledge, we provide the first distributed framework for miRNA target prediction based on Apache Hadoop and Spark. AVAILABILITY: All source code and sample data are publicly available at https://bitbucket.org/cellsandmachines/avishkar. Our scalable implementation of kernel SVM using Apache Spark, which can be used to solve large-scale non-linear binary classification problems, is available at https://bitbucket.org/cellsandmachines/kernelsvmspark.


Subjects
Computational Biology/methods; Gene Expression Profiling/methods; MicroRNAs/genetics; Algorithms; Databases, Genetic; Humans; MicroRNAs/analysis; MicroRNAs/metabolism; ROC Curve; Reproducibility of Results; Sequence Alignment/methods; Sequence Analysis, RNA/methods; Support Vector Machine
16.
Nat Commun ; 9(1): 1516, 2018 04 17.
Article in English | MEDLINE | ID: mdl-29666373

ABSTRACT

Single-cell transcriptomic data has the potential to radically redefine our view of cell-type identity. Cells that were previously believed to be homogeneous are now clearly distinguishable in terms of their expression phenotype. Methods for automatically characterizing the functional identity of cells, and their associated properties, can be used to uncover processes involved in lineage differentiation as well as sub-typing cancer cells. They can also be used to suggest personalized therapies based on molecular signatures associated with pathology. We develop a new method, called ACTION, to infer the functional identity of cells from their transcriptional profile, classify them based on their dominant function, and reconstruct regulatory networks that are responsible for mediating their identity. Using ACTION, we identify novel Melanoma subtypes with differential survival rates and therapeutic responses, for which we provide biomarkers along with their underlying regulatory networks.


Subjects
Cell Differentiation/genetics; Gene Expression Profiling/methods; Models, Genetic; Single-Cell Analysis/methods; Transcriptome/physiology; Animals; Biomarkers, Tumor/genetics; Cell Line, Tumor; Datasets as Topic; Gene Regulatory Networks/physiology; Humans; Melanoma/genetics; Melanoma/therapy; Mice; Phenotype; Survival Rate; Treatment Outcome; Tumor Microenvironment/genetics
17.
IEEE/ACM Trans Comput Biol Bioinform ; 14(6): 1378-1388, 2017.
Article in English | MEDLINE | ID: mdl-27362987

ABSTRACT

Next-generation sequencing technologies enable efficient and cost-effective genome sequencing. However, sequencing errors increase the complexity of the de novo assembly process and reduce the quality of the assembled sequences. Many error correction techniques utilizing substring frequencies have been developed to mitigate this effect. In this paper, we present a novel and effective method, called Pluribus, for correcting sequencing errors using a generalized suffix trie. Pluribus utilizes multiple manifestations of an error in the trie to accurately identify errors and suggest corrections. We show that Pluribus produces the fewest false positives across a diverse set of real sequencing datasets when compared to other methods. Furthermore, Pluribus can be used in conjunction with other contemporary error correction methods to achieve higher levels of accuracy than either tool alone. These increases in error correction accuracy are also realized in the quality of the contigs that are generated during assembly. We explore, in depth, the behavior of Pluribus to explain the observed improvement in accuracy and assembly performance. Pluribus is freely available at http://compbio.case.edu/pluribus/.
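To make the idea of substring frequencies in a generalized suffix trie concrete, here is a minimal sketch of a depth-limited trie that counts how often each short substring occurs across a set of reads. It is not the Pluribus algorithm: the function names, the dictionary-based node layout, the depth limit, and the toy reads are all assumptions for illustration. It only shows the underlying data structure, in which a substring produced by a sequencing error tends to have a much lower count than its correct neighbors.

```python
def build_suffix_trie(reads, max_depth):
    """Build a depth-limited generalized suffix trie with occurrence counts.

    Each node is a dict: {"count": int, "children": {base: node}}.
    A node's count is the number of occurrences, across all reads,
    of the substring spelled by the path from the root to that node.
    """
    root = {"count": 0, "children": {}}
    for read in reads:
        for start in range(len(read)):
            node = root
            # Insert the suffix starting at `start`, truncated to max_depth.
            for base in read[start:start + max_depth]:
                node = node["children"].setdefault(
                    base, {"count": 0, "children": {}})
                node["count"] += 1
    return root


def substring_count(trie, s):
    """Occurrences of s across all reads (0 if absent or longer than max_depth)."""
    node = trie
    for base in s:
        node = node["children"].get(base)
        if node is None:
            return 0
    return node["count"]


# Toy reads, invented for illustration; the last one differs at one position.
reads = ["ACGTAC", "ACGTAC", "ACGTAC", "ACCTAC"]
trie = build_suffix_trie(reads, max_depth=3)
common = substring_count(trie, "ACG")  # supported by three reads
rare = substring_count(trie, "ACC")    # seen once, a candidate error
```

A real corrector would need a principled threshold for "rare" and a way to propose replacements; both are outside the scope of this sketch.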


Subjects
Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Databases, Genetic
18.
IEEE/ACM Trans Comput Biol Bioinform ; 14(6): 1446-1458, 2017.
Article in English | MEDLINE | ID: mdl-27483461

ABSTRACT

Network alignment has extensive applications in comparative interactomics. Traditional approaches aim to simultaneously maximize the number of conserved edges and the underlying similarity of aligned entities. We propose a novel formulation of the network alignment problem that extends topological similarity to higher-order structures and provides a new objective function that maximizes the number of aligned substructures. This objective function corresponds to an integer programming problem, which is NP-hard. Consequently, we identify a closely related surrogate function whose maximization results in a tensor eigenvector problem. Based on this formulation, we present an algorithm called Triangular AlignMEnt (TAME), which attempts to maximize the number of aligned triangles across networks. Using a case study on the NAPAbench dataset, we show that triangular alignment is capable of producing mappings with high node correctness. We further evaluate our method by aligning yeast and human interactomes. Our results indicate that TAME outperforms state-of-the-art alignment methods in terms of conserved triangles. In addition, we show that the number of conserved triangles correlates more significantly with node correctness and co-expression of edges than the number of conserved edges does. Our formulation and resulting algorithms can be easily extended to arbitrary motifs.
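
The quantity the abstract evaluates, the number of conserved triangles under a node mapping, can be computed directly for small graphs. The sketch below only scores a given alignment; it is not the TAME optimization and involves no tensor eigenvector computation. The function names and the edge-list and dictionary representations are assumptions made for illustration.

```python
def triangles(edges):
    """Return all triangles of an undirected graph as frozensets of three nodes."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops; they cannot form a triangle
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    found = set()
    for u, neighbors in adj.items():
        for v in neighbors:
            # Any common neighbor of u and v closes a triangle.
            for w in adj[u] & adj[v]:
                found.add(frozenset((u, v, w)))
    return found


def conserved_triangles(edges_g, edges_h, mapping):
    """Count triangles of G whose images under `mapping` are triangles of H."""
    tri_h = triangles(edges_h)
    count = 0
    for tri in triangles(edges_g):
        if all(node in mapping for node in tri):
            image = frozenset(mapping[node] for node in tri)
            if len(image) == 3 and image in tri_h:
                count += 1
    return count


# Toy networks: G has one triangle, H has two.
g = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
h = [(1, 2), (2, 3), (1, 3), (3, 4), (2, 4)]
good = conserved_triangles(g, h, {"a": 1, "b": 2, "c": 3, "d": 4})
bad = conserved_triangles(g, h, {"a": 1, "b": 2, "c": 4, "d": 3})
```

With the first mapping, the triangle a-b-c lands on the triangle 1-2-3 and is conserved. With the second mapping, it lands on nodes 1, 2, and 4, which do not form a triangle in H because the edge 1-4 is missing.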


Subjects
Algorithms , Computational Biology/methods , Protein Interaction Mapping/methods , Sequence Alignment/methods , Gene Expression Profiling , Humans , Yeasts/genetics , Yeasts/metabolism
19.
Sci Rep ; 6: 38433, 2016 Dec 08.
Article in English | MEDLINE | ID: mdl-27929098

ABSTRACT

We present EP-DNN, a protocol for predicting enhancers in different cell types based on chromatin features. Specifically, we use a deep neural network (DNN)-based architecture to extract enhancer signatures in a representative human embryonic stem cell type (H1) and a differentiated lung cell type (IMR90). We train EP-DNN using p300 binding sites as enhancers, and TSS and random non-DHS sites as non-enhancers. We perform same-cell and cross-cell predictions to quantify the validation rate and compare against two state-of-the-art methods, DEEP-ENCODE and RFECS. We find that, for a given number of enhancer predictions, EP-DNN has superior accuracy, with a validation rate of 91.6% relative to 85.3% for DEEP-ENCODE and 85.5% for RFECS, and that it also scales better for a larger number of enhancer predictions. Moreover, our H1 → IMR90 predictions turn out to be more accurate than IMR90 → IMR90, potentially because H1 exhibits a richer signature set and our EP-DNN model is expressive enough to extract these subtleties. Our work shows how to leverage the full expressivity of deep learning models, using multiple hidden layers, while avoiding overfitting on the training data. We also lay the foundation for exploration of cross-cell enhancer predictions, potentially reducing the need for expensive experimentation.


Subjects
Chromatin/genetics , Computational Biology , Enhancer Elements, Genetic/genetics , Neural Networks, Computer , Algorithms , Human Embryonic Stem Cells/cytology , Humans , Lung/cytology
20.
BMC Syst Biol ; 10 Suppl 2: 54, 2016 Aug 01.
Article in English | MEDLINE | ID: mdl-27490187

ABSTRACT

BACKGROUND: Gene expression is mediated by specialized cis-regulatory modules (CRMs), the most prominent of which are called enhancers. Early experiments indicated that enhancers located far from the gene promoters are often responsible for mediating gene transcription. Knowing their properties, regulatory activity, and genomic targets is crucial to the functional understanding of cellular events, ranging from cellular homeostasis to differentiation. Recent genome-wide investigation of epigenomic marks has indicated that enhancer elements could be enriched for certain epigenomic marks, such as combinatorial patterns of histone modifications. METHODS: Our efforts in this paper are motivated by these recent advances in epigenomic profiling methods, which have uncovered enhancer-associated chromatin features in different cell types and organisms. Specifically, we use recent state-of-the-art deep learning methods and develop a deep neural network (DNN)-based architecture, called EP-DNN, to predict the presence and types of enhancers in the human genome. It uses as features the expression levels of the histone modifications at the peaks of the functional sites as well as in their adjacent regions. We apply EP-DNN to four different cell types: H1, IMR90, HepG2, and HeLa S3. We train EP-DNN using p300 binding sites as enhancers, and TSS and random non-DHS sites as non-enhancers. We perform EP-DNN predictions to quantify the validation rate for different levels of confidence in the predictions and also perform comparisons against two state-of-the-art computational models for enhancer predictions, DEEP-ENCODE and RFECS. RESULTS: We find that EP-DNN has superior accuracy and takes less time to make predictions. Next, we develop methods to make EP-DNN interpretable by computing the importance of each input feature in the classification task. This analysis indicates that the important histone modifications were distinct for different cell types, with some overlaps, e.g., H3K27ac was important in cell type H1 but less so in HeLa S3, while H3K4me1 was relatively important in all four cell types. We finally use the feature importance analysis to reduce the number of input features needed to train the DNN, thus reducing training time, which is often the computational bottleneck in the use of a DNN. CONCLUSIONS: In this paper, we developed EP-DNN, which has high accuracy of prediction, with validation rates above 90% for the operational region of enhancer prediction for all four cell lines that we studied, outperforming DEEP-ENCODE and RFECS. Then, we developed a method to analyze a trained DNN and determine which histone modifications are important and, within that, which features proximal or distal to the enhancer site are important.


Subjects
Computational Biology/methods , Enhancer Elements, Genetic/genetics , Neural Networks, Computer , Cell Line, Tumor , Gene Expression Regulation , Histones/metabolism , Humans